Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models
We were joined by the authors Eric Zelikman, Qian Huang, and Gabriel Poesia, who gave a brief overview of their work and talked about some new work on automatically generating tests.
First, there were some clarification questions:
- Parsel programs are generated by LLM prompting
- pass@n×k means that n Parsel programs are generated, and k total programs are generated in the target language
- There is some backtracking: if a function has no correct implementation, its children are reimplemented
- Lots of annoyances with the Codex API on the authors’ side prevented a more thorough comparison
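To make the backtracking point above concrete, here is a toy sketch rather than Parsel's actual implementation. It assumes one parent function with a single child, a fixed list of candidate implementations for each (standing in for LLM samples), and input/output tests on the parent only. All names (`solve_with_backtracking`, `child_candidates`, and so on) are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Child = Callable[[int], int]
Parent = Callable[[Child, int], int]


def solve_with_backtracking(
    parent_candidates: List[Parent],
    child_candidates: List[Child],
    parent_tests: List[Tuple[int, int]],
) -> Optional[Tuple[int, int]]:
    """Return (child_index, parent_index) of the first pair passing all tests."""
    for ci, child in enumerate(child_candidates):
        for pi, parent in enumerate(parent_candidates):
            if all(parent(child, x) == expected for x, expected in parent_tests):
                return ci, pi
        # No parent implementation is correct with this child:
        # backtrack and try a different implementation of the child.
    return None


# Hypothetical candidates: the first "square" implementation is buggy.
child_candidates = [lambda x: x * 2, lambda x: x * x]
parent_candidates = [lambda square, x: square(x) + 1]
parent_tests = [(2, 5), (3, 10)]  # square_plus_one(2) == 5, square_plus_one(3) == 10

result = solve_with_backtracking(parent_candidates, child_candidates, parent_tests)
print(result)
```

Note that the buggy child passes the first test by coincidence (2 * 2 == 2 ** 2), so it takes the second test to trigger the backtrack.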
We discussed evaluation a bit:
- Someone asked how good the generated test cases were, specifically what percentage of programs that pass the generated test cases also pass the final evaluation. The authors haven’t checked this yet, since the test generation work is only a day old.
- We also discussed datasets for evaluation: there is no good evaluation for testing decompositional reasoning, since all current datasets consist of simple, single-line functions
- Eric bets there will be one soon ;)
- It is still unclear to me how much of the improvement is due to the decompositional style of Parsel, how much to the presence of tests, and how much to the prompts being enhanced by “let’s think step by step to come up with a clever algorithm”. More ablations would have helped here.
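The question about generated test quality can be phrased as a simple ratio: among programs that pass the generated tests, what fraction also pass the final evaluation? Below is a minimal sketch of that calculation. The function name and the example data are made up for illustration and are not results from the paper or the authors.

```python
from typing import List, Optional


def generated_test_precision(
    passed_generated: List[bool], passed_final: List[bool]
) -> Optional[float]:
    """Fraction of programs passing the generated tests that also pass the final evaluation."""
    if len(passed_generated) != len(passed_final):
        raise ValueError("both lists must have one entry per program")
    # Keep the final-evaluation outcome only for programs that passed the generated tests.
    selected = [final for gen, final in zip(passed_generated, passed_final) if gen]
    if not selected:
        return None  # undefined when no program passes the generated tests
    return sum(selected) / len(selected)


# Made-up outcomes for four candidate programs.
passed_generated = [True, True, False, True]
passed_final = [True, False, False, True]
precision = generated_test_precision(passed_generated, passed_final)
print(precision)
```

A value well below 1 would suggest the generated tests are too weak to filter out incorrect programs.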
We talked about UI aspects, where authors shared their thoughts on the following:
- How would a user actually use Parsel?
- How would someone debug a Parsel program?
- How would Parsel programs be shared as libraries/packages?
How could Parsel be more widely adopted?
- People don’t think of programming in terms of high-level descriptions and examples, which may pose a problem for adoption
- Eric has a grand vision of taking large GitHub files, rewriting them into Parsel, and seeing how much easier it would be
- Gabriel finds the idea of coding in languages that are easy to read but not to write fascinating: Parsel could be a stepping stone here.
- Would people really write long programs in Parsel? How much modularity does Parsel really provide?
Are there shared components between Parsel programs?
- There is some caching, so generation will be faster when Parsel programs share high-level components
- Possibility of doing some library learning on Parsel programs?
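As a rough illustration of how caching could help programs that share components, here is a minimal sketch. It assumes the cache is keyed on the text of a function's natural-language description. How Parsel actually keys its cache was not discussed, so the class, the key choice, and the `fake_generate` stand-in for an LLM call are all hypothetical.

```python
import hashlib
from typing import Callable, Dict


class ImplementationCache:
    """Caches generated implementations, keyed by a hash of the description text."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._store: Dict[str, str] = {}
        self.misses = 0  # number of times the generator was actually called

    def get(self, description: str) -> str:
        key = hashlib.sha256(description.strip().encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._generate(description)
        return self._store[key]


def fake_generate(description: str) -> str:
    # Hypothetical stand-in for an expensive LLM call.
    return f"# implementation of: {description}"


cache = ImplementationCache(fake_generate)
# Two hypothetical programs that share the "sort a list of integers" component.
program_a = ["sort a list of integers", "return the median of a sorted list"]
program_b = ["sort a list of integers"]
for description in program_a + program_b:
    cache.get(description)
print(cache.misses)
```

Only the two distinct descriptions trigger generation; the shared component is served from the cache the second time.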